Abstract: We consider the design of cognitive Medium Access
Control (MAC) protocols enabling an unlicensed (secondary)
transmitter-receiver pair to communicate over the idle periods
of a set of licensed channels, i.e., the primary network. The
objective is to maximize data throughput while maintaining the
synchronization between secondary users and avoiding
interference with licensed (primary) users. No statistical
information about the primary traffic is assumed to be available
a-priori to the secondary user. We investigate two distinct
sensing scenarios. In the first, the secondary transmitter is
capable of sensing all the primary channels, whereas it senses
one channel only in the second scenario. In both cases, we
propose MAC protocols that efficiently learn the statistics of
the primary traffic on-line. Our simulation results demonstrate
that the proposed blind protocols asymptotically achieve the
throughput obtained when prior knowledge of primary traffic
statistics is available.

I. Introduction 

Most licensed spectrum resources are under-utilized.
This observation has encouraged the emergence of dynamic 
and opportunistic spectrum access concepts, where unlicensed 
(secondary) users equipped with cognitive radios are allowed 
to opportunistically access the spectrum as long as they do 
not interfere with licensed (primary) users. To achieve this 
goal, secondary users must monitor the primary traffic in 
order to identify spectrum holes or opportunities which can 
be exploited to transfer data [1]. 

The main goal of a cognitive MAC protocol is to sense the
radio spectrum, detect the occupancy state of different primary
spectrum channels, and then opportunistically communicate
over unused channels (spectrum holes) with minimal interference
to the primary users. Specifically, the cognitive MAC protocol
should continuously make efficient decisions on which
channels to sense and access in order to obtain the most benefit
from the available spectrum opportunities. Several cognitive
MAC protocols have been proposed in previous studies. For
example, in [2], MAC protocols were constructed assuming
each secondary user is equipped with two transceivers: a
control transceiver tuned to a dedicated control channel and
a software-defined radio (SDR) based transceiver tuned to
any available channels to sense, receive, and transmit
signals/packets. On the other hand, [3] proposed a sensing-period
optimization mechanism and an optimal channel-sequencing
algorithm, as well as an environment-adaptive channel-usage
pattern estimation method.

The slotted Markovian structure for the primary network
traffic, adopted here, was also considered in [5] where the
optimal policy was characterized and a simple greedy policy
for secondary users was constructed. The authors of [5],
however, assumed that the primary traffic statistics (i.e., Markov
chain transition probabilities) were available a-priori to the
secondary users. Here, our focus is on the blind scenario
where the cognitive MAC protocol must learn the transition
probabilities on-line.

In this work, we differentiate between two scenarios. The 
first assumes that the secondary transmitter can sense all the 
available primary channels before making the decision on 
which one to access. The secondary receiver, however, does 
not participate in the sensing process and can wait to decode 
on only one channel. This is the model adopted in [4]. In 
the sequel, we propose an efficient algorithm that optimizes 
the on-line learning capabilities of the secondary transmitter 
and ensures perfect synchronization between the secondary 
pair. The proposed protocol does not assume a separate 
control channel, and hence, piggybacks the synchronization 
information on the same data packet. Our numerical results 
demonstrate the superiority of the proposed protocol over the 
one in [4] where the secondary transmitter and receiver are
assumed to access the channel in a predetermined sequence, 
which they agreed upon a-priori. 

The second scenario assumes that both the secondary
transmitter and receiver can sense only one primary channel in
each time slot. This problem can be recast as a restless
multi-armed bandit problem where the optimal algorithm must 
strike a balance between exploration and exploitation [8]. 
Unfortunately, finding the optimal solution for this problem 
remains an elusive task [10]. Inspired by the recent results 
of [8] and [9], an efficient MAC protocol is constructed which 
can be viewed as the Whittle index strategy of [8] augmented 
with a similar learning phase to the one proposed in [9] for 
the multi-armed bandit scenario. Our numerical results show 
that the performance of this protocol converges to the Whittle 
index strategy with known transition probabilities [8]. 

II. Network Model 

A. Primary Network 

We consider a primary network consisting of $N$ independent
channels with its users communicating according to a
synchronous slot structure. We use $i$ to refer to the channel
index, $i \in \{1, \dots, N\}$, and $j$ to refer to the time-slot
index, $j \in \{1, \dots, T\}$. The $i$th primary channel has a
bandwidth of $B_i$. The traffic statistics of the primary network
are such that the occupancy of each of the $N$ channels follows a
discrete-time Markov process with two states. The state of the
$i$th channel at time slot $j$, $S_i^{(j)}$, is equal to 1 if the
channel is free, and to 0 if it is busy. The state diagram for a
single Markov channel model is illustrated in Figure 1.

Fig. 1. The Gilbert-Elliott channel model

The channel state transition matrix of the Markov chain is given
by

$$P^i = \begin{bmatrix} P_{00}^i & P_{01}^i \\ P_{10}^i & P_{11}^i \end{bmatrix}.$$

We assume that $P^i$ remains fixed for a block of $T$ time slots
and is unknown a-priori to the secondary user.
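
The two-state occupancy model above can be illustrated with a short simulation. The following is a minimal sketch, not the authors' simulator: the function names, the `initial_state` argument, and the use of Python's `random.Random` are assumptions made for illustration. It draws a sequence of channel states (1 = free, 0 = busy) from the probabilities of moving to the free state, `p01` (busy to free) and `p11` (free to free), and computes the steady-state probability of the free state for comparison.

```python
import random


def simulate_channel(p01, p11, num_slots, rng, initial_state=0):
    """Return a list of channel states, one per slot (1 = free, 0 = busy).

    p01: probability of moving from busy (0) to free (1).
    p11: probability of staying free (1 -> 1).
    initial_state is a hypothetical starting state before the first slot.
    """
    states = []
    state = initial_state
    for _ in range(num_slots):
        # The chance that the next slot is free depends only on the current state.
        p_free = p11 if state == 1 else p01
        state = 1 if rng.random() < p_free else 0
        states.append(state)
    return states


def stationary_free_probability(p01, p11):
    """Steady-state probability that the two-state chain is in the free state."""
    return p01 / (1.0 - p11 + p01)


rng = random.Random(0)
trace = simulate_channel(0.2, 0.8, 20000, rng)
empirical = sum(trace) / len(trace)
print(empirical, stationary_free_probability(0.2, 0.8))
```

For long traces the empirical fraction of free slots should approach the steady-state value, which gives a quick sanity check on any generated primary traffic.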

B. Secondary Pair 

It is assumed that the secondary transmitter can sense $L_1$
channels ($L_1 \le N$) and can access only one channel in each
slot. The secondary transmitter can only transmit if the channel
it chooses to access is sensed to be free. Here, we only report
our results for the two special cases $L_1 = N$ and $L_1 = 1$. The
more general case will be addressed in the journal version.

The secondary receiver does not participate in channel
sensing and is assumed to be capable of accessing only
one channel [4]. This assumption is intended to limit the
decoding complexity needed by the secondary receiver.
Another motivation behind restricting channel sensing to the
transmitter is the potentially different sensing outcomes at the
secondary transmitter and receiver due to the spatial diversity
of the primary traffic, which can lead to the breakdown of the
secondary transmitter-receiver synchronization.

Conceptually, our proposed cognitive MAC protocol can be 
decomposed into the following stages: 

• Decision stage: The secondary transmitter decides which
$L_1$ channels to sense. Also, both transmitter and receiver
decide which channel to access.

• Sensing stage: The transmitter senses the $L_1$ selected
primary channels.

• Learning stage: The transmitter updates the estimated
primary channels' statistics, $P^i$.

• Access stage: If the access channel is sensed to be free, 
a data packet is transmitted to the secondary receiver. 
This packet contains the information needed to sustain 
synchronization between secondary terminals and, hence, 
synchronization does not require a dedicated control 
channel. The length of the packet is assumed to be large 
enough such that the loss of throughput resulting from 
the synchronization overhead is marginal. 



• ACK stage: The receiver sends an ACK to the transmitter 
upon successful reception of sent data. 

The performance of the sensing stage is limited by two
types of errors. If the secondary transmitter decides that an
empty channel is busy, it will refrain from transmitting, and
a spectrum opportunity is overlooked. This is the false alarm
situation, which is characterized by the probability of false
alarm $P_{FA}$. On the other hand, if the detector fails to sense
a busy channel as busy, a missed detection occurs, resulting in
interference with the primary user. The probability of missed
detection is denoted by $P_{MD}$. In the rest of the paper,
$\hat{S}_i^{(j)}$ denotes the state of channel $i$ at time slot
$j$ as sensed by the transmitter, which might not be the actual
channel state $S_i^{(j)}$. Overall, successful communication
between the secondary transmitter and receiver occurs only when:
1) they both decide to access the same channel, and 2) the channel
is sensed to be free and is actually free from primary
transmissions.
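
The two sensing errors and the success condition described above can be sketched in a few lines. This is an illustrative model only: the function names `sense` and `slot_succeeds` and the argument names are hypothetical, and the sensing outcome is assumed to be an independent random draw in each slot.

```python
import random


def sense(actual_state, p_fa, p_md, rng):
    """Return the sensed state (1 = free, 0 = busy) for a given actual state.

    p_fa: probability that a free channel is reported busy (false alarm).
    p_md: probability that a busy channel is reported free (missed detection).
    """
    if actual_state == 1:
        # False alarm: the spectrum opportunity is overlooked.
        return 0 if rng.random() < p_fa else 1
    # Missed detection: the transmitter would interfere with the primary user.
    return 1 if rng.random() < p_md else 0


def slot_succeeds(tx_channel, rx_channel, sensed_state, actual_state):
    """Success needs a common channel that is sensed free and actually free."""
    return tx_channel == rx_channel and sensed_state == 1 and actual_state == 1


rng = random.Random(0)
sensed = sense(1, 0.1, 0.05, rng)
print(slot_succeeds(2, 2, sensed, 1))
```

Setting both error probabilities to zero recovers the perfect-sensing case assumed in the reported simulations.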

III. Full Sensing Capability: $L_1 = N$

In this section it is assumed that the secondary transmitter
can sense all $N$ primary channels at the beginning of each
time slot. The initial packet sent to the receiver includes
estimates for the transition probabilities, and the belief vector
$\Omega^{(1)}$, where $\Omega^{(j)} = [\omega_1^{(j)}, \dots,
\omega_N^{(j)}]$, and $\omega_i^{(j)}$ is the estimate, common to
the transmitter and receiver, of the prior probability
that channel $i$ is free at the beginning of time slot $j$, on
the basis of the sensing history of channel $i$. Once the
initial communication is established, the secondary transmitter
and receiver implement the same spectrum access strategy
described below for $j > 1$.

1) Decision: At the beginning of time slot $j$, and using
belief vector $\Omega^{(j)}$, the secondary transmitter and
receiver decide to access channel
$i^*(j) = \arg\max_{i=1,\dots,N} \left[\omega_i^{(j)} B_i\right]$.

2) Sensing: The secondary transmitter senses all channels
and captures the sensing vector $\hat{S}^{(j)} =
[\hat{S}_1^{(j)}, \dots, \hat{S}_N^{(j)}]$, where
$\hat{S}_i^{(j)} = 1$ if the $i$th channel is sensed to be free,
and $\hat{S}_i^{(j)} = 0$ if it is found busy.

3) Learning: Based on the sensing results, the transmitter
updates the estimates $\hat{P}_{01}^i$ and $\hat{P}_{11}^i$ for
all primary channels as explained below.

4) Access: If $\hat{S}_{i^*}^{(j)} = 1$, the transmitter sends its
data packet to the receiver. The packet includes $\hat{S}^{(j)}$,
$\hat{P}_{01}^i$ and $\hat{P}_{11}^i$. In addition, if the
transmission at slot $j - 1$ has failed, the transmitter sends
$\hat{\Omega}^{(j)}$, which is the belief vector computed at the
transmitter based on its observations. If the receiver
successfully receives the packet, it sends an ACK back to the
transmitter. Parameter $K_{i^*}^{(j)}$ is equal to unity if an
ACK is received by the transmitter, and zero otherwise. If the
channel is free, the forward transmission and the feedback
channel are assumed to be error-free.

5) Finally, the transmitter and receiver update the common
belief vector $\Omega^{(j+1)}$ such that:

$\hat{P}_{01}^i$ and $\hat{P}_{11}^i$ are the most recent shared
estimates of the $i$th channel transition probabilities.
Obviously, in case of perfect sensing, $A_i = 1$, $C_i = 0$ and
$D_i = 0$.

In addition, the transmitter computes another belief vector,
$\hat{\Omega}^{(j+1)}$, based on its observations:

where $\hat{A}_i$, $\hat{C}_i$, and $\hat{D}_i$ are the same as
$A_i$, $C_i$ and $D_i$ with $\omega_i^{(j)}$ replaced by
$\hat{\omega}_i^{(j)}$. Note that $\hat{\Omega}^{(1)} =
\Omega^{(1)}$, and $\hat{\Omega}^{(j+1)}$ differs from
$\Omega^{(j+1)}$ only when $K_{i^*}^{(j)} = 0$. If transmission
succeeds at the $j$th time slot after one or more failures, the
transmitter and receiver set $\Omega^{(j)} = \hat{\Omega}^{(j)}$
before computing $\Omega^{(j+1)}$.

Since we assume that traffic statistics on primary channels
($P^i$) are unknown to the secondary users a-priori, the
secondary users need to estimate these probabilities. When
continuous observations of each channel are available, each
channel can be modeled as a hidden Markov model (HMM).
An optimal learning algorithm for HMMs is described in [7],
using which the transition probabilities, $P_{FA}$, and $P_{MD}$
can be estimated. However, we propose a much less complex
algorithm based on simple counting, which approximates the
probabilities estimated by the optimal HMM algorithm. The
algorithm we propose works as follows. After sensing all 
the primary channels at the beginning of each time slot, the 
secondary transmitter keeps track of the following metrics for 
each channel: 



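• Number of times each channel was sensed to be free:
$N_1^i(j) = \sum_{l=1}^{j} \hat{S}_i^{(l)}$

• Number of times each channel was sensed to be busy:
$N_0^i(j) = \sum_{l=1}^{j} \left(1 - \hat{S}_i^{(l)}\right)$

• Number of state transitions from free to free:
$N_{11}^i(j) = \sum_{l=1}^{j-1} \hat{S}_i^{(l)} \hat{S}_i^{(l+1)}$

• Number of state transitions from busy to free:
$N_{01}^i(j) = \sum_{l=1}^{j-1} \left(1 - \hat{S}_i^{(l)}\right) \hat{S}_i^{(l+1)}$

The transition probabilities are estimated as:

$$\hat{P}_{01}^i(j) = \frac{N_{01}^i(j)}{N_0^i(j)}, \qquad \hat{P}_{11}^i(j) = \frac{N_{11}^i(j)}{N_1^i(j)}$$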

In order to share channel transition probabilities between
secondary transmitter and receiver as dictated by the strategy
for the $L_1 = N$ case, values of $N_1^i(j)$, $N_0^i(j)$,
$N_{11}^i(j)$ and $N_{01}^i(j)$ for each channel are sent within
the transmitted packet. If $K_{i^*}^{(j)} = 1$, the transmitter
and receiver update $\hat{P}_{01}^i(j)$ and $\hat{P}_{11}^i(j)$.
Otherwise, the transmitter only updates $N_1^i(j)$, $N_0^i(j)$,
$N_{11}^i(j)$ and $N_{01}^i(j)$, but uses the estimates from the
last successful transmission in order to determine which channel
to access at the beginning of a time slot.
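
The counting estimator described above is easy to express in code. The sketch below is illustrative rather than the authors' implementation: the function names and the `default` value returned when a state has never been observed are assumptions. It takes the sequence of sensed states of one channel (1 = free, 0 = busy), accumulates the four counters, and forms the two ratios.

```python
def count_statistics(sensed):
    """Return (n0, n1, n01, n11) for one channel's sensed-state sequence."""
    n1 = sum(sensed)                 # slots sensed free
    n0 = len(sensed) - n1            # slots sensed busy
    pairs = list(zip(sensed, sensed[1:]))
    n11 = sum(a * b for a, b in pairs)         # free -> free transitions
    n01 = sum((1 - a) * b for a, b in pairs)   # busy -> free transitions
    return n0, n1, n01, n11


def estimate_transitions(sensed, default=0.5):
    """Estimate (p01, p11) by counting; `default` is a hypothetical fallback."""
    n0, n1, n01, n11 = count_statistics(sensed)
    p01 = n01 / n0 if n0 > 0 else default
    p11 = n11 / n1 if n1 > 0 else default
    return p01, p11


print(estimate_transitions([1, 1, 0, 1, 0, 0, 1]))
```

Because only four integers per channel are needed, the counters can be carried in the data packet, which is what allows the receiver to reproduce the same estimates.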

In a nutshell, the proposed algorithm uses the full sensing
capability of the secondary transmitter to decouple the
exploration (i.e., learning) task from the exploitation task.
After an ACK is received, both nodes use the common
observation-based belief vector to make the optimal access
decision. On the other hand, in the absence of the ACK, the nodes
cannot both use the optimal belief vector if synchronization is
to be maintained.
In this case, the proposed algorithm opts for a greedy strategy 
in order to minimize the time between two successive ACKs. 
At this point, we only conjecture the optimality of this strategy 
and continue to work on the proof for the journal version of 
this work. 

As an analytical benchmark, we have the following upper
bound on the achievable throughput in this scenario. Assuming
that the delayed side information of all the primary channels'
states $S_i^{(j-1)}$ is given to the secondary transmitter and
receiver, to decide on the channel to access at time $j$, an upper
bound on the expected throughput per slot is given by:

where $P_{S_i 1}^i$ denotes the state transition probability for
channel $i$ from state $S_i \in \{0, 1\}$ to the free state, and
$P_{S_i}$ is the Markov steady state probability of channel $i$
being free or busy. The first term in the summation corresponds
to the probability that the $N$ channels are in one of the $2^N$
states, and the second term represents the highest expected
throughput given the current joint state for the $N$ channels.
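
The description of the bound suggests a direct enumeration over the $2^N$ joint states. The following sketch follows that verbal description and is an interpretation, not a transcription of the paper's formula: the function name and argument layout are hypothetical, and channels are assumed independent so that the joint-state probability is the product of per-channel steady-state probabilities.

```python
from itertools import product


def upper_bound_throughput(p01, p11, bandwidth):
    """Expected per-slot throughput with delayed knowledge of all states.

    p01[i], p11[i]: busy->free and free->free probabilities of channel i.
    bandwidth[i]: bandwidth of channel i.
    """
    n = len(p01)
    # Steady-state probability that each channel is free.
    pi_free = [a / (1.0 - b + a) for a, b in zip(p01, p11)]
    total = 0.0
    for joint in product((0, 1), repeat=n):
        prob = 1.0   # probability of this joint previous state
        best = 0.0   # best expected throughput given this joint state
        for i, s in enumerate(joint):
            prob *= pi_free[i] if s == 1 else 1.0 - pi_free[i]
            p_next_free = p11[i] if s == 1 else p01[i]
            best = max(best, p_next_free * bandwidth[i])
        total += prob * best
    return total


print(upper_bound_throughput([0.2, 0.3], [0.8, 0.6], [1.0, 1.0]))
```

The enumeration grows as $2^N$, which is manageable for the small number of channels used in the simulations but would need a different approach for large $N$.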

A final remark is now in order. Assuming that $P_{11}^i =
P_{01}^i$, a channel's probability of being free, $P_{S_i=1}$,
becomes independent of the previous state, i.e., $P_{S_i=1} =
P_{11}^i = P_{01}^i$. In this case, the optimal strategy, assuming
that the transition probabilities are known, is for the secondary
transmitter to access the channel $i^* = \arg\max_{i=1,\dots,N}
[P_{S_i=1} B_i]$, and the expected throughput becomes
$\max_{i=1,\dots,N} [P_{S_i=1} B_i]$ [9]. Assuming, however, that
the transition probabilities are unknown but both nodes know that
$P_{11}^i = P_{01}^i$, one can estimate each channel's free
probability $P_{S_i=1}$ as $\hat{P}_{S_i=1} = N_1^i(j)/j$. In
Section V we quantify the value of this side information by
comparing the performance of this strategy with our universal
algorithm that does not make any prior assumptions about the
transition probabilities.
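
For the memoryless special case in this remark, the estimate and the channel choice reduce to a frequency count followed by an arg max. A minimal sketch under that assumption follows; the function names are hypothetical, and returning 0.0 for a channel with no observations is an arbitrary choice made for illustration.

```python
def estimate_free_probability(sensed):
    """Fraction of slots in which the channel was sensed free (N_1(j) / j)."""
    return sum(sensed) / len(sensed) if sensed else 0.0


def best_channel(free_probabilities, bandwidths):
    """Index of the channel maximizing free probability times bandwidth."""
    scores = [p * b for p, b in zip(free_probabilities, bandwidths)]
    return max(range(len(scores)), key=scores.__getitem__)


histories = [[1, 0, 1, 1], [0, 0, 1, 0]]
estimates = [estimate_free_probability(h) for h in histories]
print(estimates, best_channel(estimates, [1.0, 1.0]))
```

Ties are resolved in favor of the lowest channel index here; any fixed rule works as long as the transmitter and receiver apply the same one.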

IV. The Restless Bandit Scenario: $L_1 = 1$

Assuming that the transition probabilities are known a-priori
by the secondary users, the medium access scenario in
this case can be formulated as a partially observable Markov
decision process (POMDP) [5]. The optimal policy, in this
scenario, must strike a balance between gaining instantaneous
reward by exploiting channels based on already known
information, and gaining information for future use by exploring
new spectrum opportunities. Motivated by the prohibitive
computational complexity of the optimal strategy, the authors
of [5] further proposed a reduced complexity strategy based on
the greedy approach that maximizes the per-slot throughput based
on already known information (exploitation only). In a
more recent work [8], the problem was recast as a restless
bandit problem and Whittle's index approach was used to
construct a more efficient medium access policy [10].

Here, we relax the assumption of the a-priori known tran- 
sition probabilities by the secondary transmitter/receiver. This 
adds another interesting dimension to the problem since the 
blind cognitive MAC protocol must now learn this statistical 
information on-line in order to make the appropriate access 
decisions. Inspired by previous results of Lai et al. in the 
multi-armed bandit setup [9], we propose the following simple 
strategy. At the beginning of the $T$ slots, each of the $N$
primary channels is continuously monitored for an initial
learning period (LP) to obtain estimates of $P_{11}^i$ and
$P_{01}^i$. Then, by assigning Whittle's index to each channel,
we are able to
choose which channel to access at each time slot. In summary, 
the strategy works as follows. 

1) Initial learning period: Each channel is continuously
sensed for LP time slots. At the end of the learning period,
the transition probabilities are estimated as
$\hat{P}_{01}^i = N_{01}^i / N_0^i$ and
$\hat{P}_{11}^i = N_{11}^i / N_1^i$.

2) Decision: At the beginning of any time slot $j$
($j > N \times LP$), the secondary transmitter and receiver
decide to access channel $i^*(j) = \arg\max_{i=1,\dots,N}
x_i^{(j)}$, where $x_i^{(j)}$ is Whittle's index of channel $i$.

3) Sensing: The secondary transmitter senses channel $i^*(j)$.

4) Learning: If $i^*(j) = i^*(j-1)$, update $N_{11}^i$, $N_1^i$,
$N_{01}^i$, $N_0^i$, $\hat{P}_{11}^i$, and $\hat{P}_{01}^i$.

5) Access: If $\hat{S}_{i^*}^{(j)} = 1$, the transmitter sends its
data packet to the receiver. If the receiver successfully
receives a packet, it sends an ACK back to the transmitter.

6) The transmitter and receiver calculate $\Omega^{(j+1)}$ given
that:

$$\omega_i^{(j+1)} = \begin{cases}
\hat{P}_{11}^i & \text{if } i = i^*(j),\ K_{i^*}^{(j)} = 1 \\
D_i \hat{P}_{11}^i + (1 - D_i) \hat{P}_{01}^i & \text{if } i = i^*(j),\ K_{i^*}^{(j)} = 0 \\
\omega_i^{(j)} \hat{P}_{11}^i + (1 - \omega_i^{(j)}) \hat{P}_{01}^i & \text{if } i \neq i^*(j)
\end{cases}$$

where $\hat{P}_{11}^i$ and $\hat{P}_{01}^i$ are the latest
successfully shared estimates between the secondary
transmitter-receiver pair. Finally, $\Omega^{(j+1)}$ is used to
update Whittle's index $x_i^{(j+1)}$ of each channel as detailed
in [8].
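
The belief update in the last step can be sketched for the perfect-sensing case, where $D_i = 0$. This is a simplified illustration under stated assumptions, not the protocol as specified: the function name is hypothetical, sensing is assumed perfect so that an ACK means the chosen channel was free and a missing ACK means it was busy, and channels that were not sensed are assumed to follow the standard one-step Markov prediction.

```python
def update_beliefs(beliefs, chosen, ack, p01_hat, p11_hat):
    """Return next-slot free probabilities for all channels.

    beliefs[i]: current probability that channel i is free.
    chosen: index of the channel accessed in this slot.
    ack: True if an ACK was received on the chosen channel.
    p01_hat[i], p11_hat[i]: shared estimates of busy->free and free->free.
    """
    updated = []
    for i, w in enumerate(beliefs):
        if i == chosen:
            # Perfect sensing assumed: ACK => channel was free, no ACK => busy.
            updated.append(p11_hat[i] if ack else p01_hat[i])
        else:
            # Unobserved channel: one-step Markov prediction (assumption).
            updated.append(w * p11_hat[i] + (1.0 - w) * p01_hat[i])
    return updated


print(update_beliefs([0.5, 0.5], 0, True, [0.2, 0.3], [0.8, 0.6]))
```

Since the update depends only on the chosen channel, the ACK outcome, and the shared estimates, the transmitter and receiver can run it independently and remain synchronized.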

In the case of time-independent channel states, i.e.,
$P_{11}^i = P_{01}^i$, the problem reduces to the multi-armed
bandit scenario considered in [9]. The difference, here, is the
lack of the
dedicated control channel, between the cognitive transmitter 
and receiver, as assumed in [9]. The following strategy, which 
is applied as soon as the initial synchronization is established, 
avoids this drawback by ensuring synchronization using the 
ACK feedback over the same data channel. 

Fig. 2. Throughput comparison between: the upper bound from
equation (5), the proposed blind strategy for $L_1 = N$, the
Whittle index strategy for $L_1 = 1$, the greedy strategy for
$L_1 = 1$, and the maximum achievable offline bound.



1) Decision: At the beginning of any time slot $j$, the
secondary transmitter and receiver decide to access the
channel

where $X_i^{(j)}$ is the number of time slots where successful
communication occurs on channel $i$, and $Y_i^{(j)}$ is the
number of time slots where channel $i$ is chosen to sense and
access.

2) Sensing: The secondary transmitter senses channel $i^*(j)$.

3) Access: If $\hat{S}_{i^*}^{(j)} = 1$, the transmitter sends its
data packet to the receiver. If the receiver successfully
receives a packet, it sends an ACK back to the transmitter.

4) The transmitter and receiver update the following:

V. Numerical Results

In this section we present simulation results for the two
scenarios discussed earlier. Throughout this section, we
assume that the number of primary channels $N = 5$, each
with bandwidth $B_i = 1$. The spectrum usage statistics of the
primary network were assumed to remain unchanged for a
block of $T = 10^4$ time slots for Figures 2, 3 and 4 and for
a block of $T = 10^5$ time slots for Figure 5. The transition
probabilities for each channel, $P_{11}^i$ and $P_{01}^i$, were
generated
randomly between 0.1 and 0.9. The plotted results are the 
average over 1000 simulation runs. The discount factor used to 
obtain the Whittle index is 0.9999. In all reported simulations, 
perfect sensing is assumed, and the average throughput per 
time slot is plotted. 

Figure 2 reports the throughput comparison between the
different cognitive MAC strategies, all with prior knowledge
about the channels' transition probabilities. The loss in
throughput between the upper bound and the proposed strategy for
the $L_1 = N$ case is shown, and the gain offered by the full
sensing capability as compared with the $L_1 = 1$ scenario is
apparent. It is seen also that the strategies we proposed achieve
higher throughput than the best offline bound described in [4], in
which the channel with the highest steady state probability of
being free is always chosen. Figure 3 illustrates the convergence
of the throughput of the proposed blind strategy for $L_1 = N$,
with no prior information, to the case with prior knowledge
of the transition probabilities as $T$ grows. In Figure 4 we
assume that $P_{11}^i = P_{01}^i$ for all channels. It is shown
that even if the secondary users are unaware of this fact, and
apply the proposed strategy, the achievable throughput converges
asymptotically to the achievable performance when the fact
that $P_{11}^i = P_{01}^i$ is known a-priori, albeit at the expense
of a longer learning phase. Interestingly, both strategies are
shown to converge asymptotically to the genie-aided upper bound
(when the transition probabilities are known). Finally, Figure 5
demonstrates the tradeoff between the learning time overhead
in the blind strategy of Section IV and the final achievable
throughput at the end of the $T$ slots. Clearly, this figure
supports the intuitive conclusion that for large $T$ blocks, one
can tolerate a longer learning phase in order to maximize the
steady state achievable throughput.

Fig. 3. Throughput comparison between the proposed strategy for
($L_1 = N$) with and without known transition probabilities.

Fig. 4. Throughput comparison for the blind cognitive MAC protocol
(with and without the prior knowledge that $P_{11}^i = P_{01}^i$)
and the genie-aided scenario.

VI. Conclusion 

In this work, we propose blind cognitive MAC protocols that
do not require any prior knowledge about the statistics of the
primary traffic. We differentiate between two distinct
scenarios, based on the complexity of the cognitive transmitter.
In the first, the full sensing capability of the secondary
transmitter is fully utilized to learn the statistics of the
primary traffic while ensuring perfect synchronization between
the secondary transmitter and receiver in the absence of a
dedicated control channel. The second scenario focuses on a
low-complexity cognitive transmitter capable of sensing one
channel only at the beginning of each time slot. For this case,
we propose an augmented Whittle index MAC protocol that allows
for an initial learning phase to estimate the transition
probabilities of the primary traffic. Our numerical results
demonstrate the convergence of the blind protocols' performance
to that of the genie-aided scenario where the primary traffic
statistics are known a-priori by the secondary transmitter and
receiver.

Fig. 5. Throughput comparison between the proposed blind strategy
for ($L_1 = 1$), when LP = 20 and LP = 200, and the genie-aided
case.